Abstract

We consider the stochastic graph model where the location of each vertex is a random point in a
given metric space. We study the problems of computing the expected lengths of the minimum spanning
tree, the minimum perfect matching and the minimum cycle cover on such a stochastic graph and
obtain an FPRAS (Fully Polynomial Randomized Approximation Scheme) for each of these problems.
Our result for stochastic minimum spanning trees improves upon the previously known constant factor
approximation algorithm. Our results for the stochastic minimum perfect matching and the stochastic
minimum cycle cover are, to the best of our knowledge, the first known algorithms for these problems.



1 Introduction 



Motivated by the uncertainty inherent in the large graph data generated nowadays by a variety of sources, we
consider the following fundamental stochastic graph model. We are given a metric space $\mathcal{P}$. The location
of each node $v \in V$ in the stochastic graph $\mathcal{G}$ is a random point in the metric space, and the probability
distribution is given as the input. We assume the distributions are discrete and independent of each other.
We use $p_{vs}$ to denote the probability that the location of node $v$ is point $s \in \mathcal{P}$. The model is also termed
the locational uncertainty model in [23]. A special case of this model where all points follow the same
distribution has been studied extensively in the stochastic geometry literature (see e.g., [7, 9, 8, 24, 29]).
The model is also of fundamental interest in the area of wireless networks. In many applications, we only
have some prior information about the locations of the transmission nodes (e.g., some sensors that will be
deployed randomly in a designated area by an aircraft). Such a stochastic wireless network can be captured
precisely by this model. See e.g., a recent survey [20] and the references therein.

We are interested in estimating the expected length of certain combinatorial objects in the stochastic
graph model. We need some notation in order to define our problems formally. We use the term nodes (or
vertices) to refer to the vertices of the graph and points (or locations) to refer to the points in the metric space. Denote
the set of nodes as $V = \{v_1, \ldots, v_n\}$ and the set of points as $\mathcal{P} = \{s_1, \ldots, s_m\}$, where $n = |V|$ and $m = |\mathcal{P}|$.
A realization $r$ of the stochastic graph $\mathcal{G}$ can be represented by an $n$-dimensional vector $(r_1, \ldots, r_n) \in \mathcal{P}^n$,
where point $r_i$ is the location of node $v_i$ for $1 \le i \le n$. Let $\mathcal{R}$ denote the set of all possible realizations.
Since the nodes are independent, $r$ occurs with probability $\Pr[r] = \prod_{i \in [n]} p_{v_i r_i}$. In this paper,
we study three classic combinatorial problems in this model: minimum spanning tree (MST), minimum
length perfect matching (MPM) (assuming an even number of nodes) and minimum length cycle cover
(CC). Taking the minimum spanning tree problem as an example, we would like to estimate the following
quantity:

\[ \mathbb{E}[\mathrm{MST}] = \sum_{r \in \mathcal{R}} \Pr[r] \cdot \mathrm{MST}(r), \]

where $\mathrm{MST}(r)$ is the length of the minimum spanning tree spanning all points in $r$. However, the above
formula does not give us an efficient way to estimate the expectation since it involves an exponential number
of terms.
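To make the exponential blow-up concrete, the following minimal Python sketch evaluates the sum above by enumerating every realization. It is an illustration only: it assumes points in the Euclidean plane and a hypothetical input format `dists[v] = [(point, probability), ...]`, and the helper names `mst_length` and `expected_mst_bruteforce` are not from the paper. The loop visits up to $m^n$ realizations, so it is usable only on toy instances.

```python
import itertools
import math


def mst_length(points):
    """Length of the Euclidean MST over a list of 2-D points (Prim, O(k^2))."""
    k = len(points)
    if k <= 1:
        return 0.0
    best = [math.inf] * k      # cheapest edge connecting each point to the tree
    in_tree = [False] * k
    best[0] = 0.0
    total = 0.0
    for _ in range(k):
        # pick the cheapest point not yet in the tree
        u = min((i for i in range(k) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for w in range(k):
            if not in_tree[w]:
                best[w] = min(best[w], math.dist(points[u], points[w]))
    return total


def expected_mst_bruteforce(dists):
    """Exact E[MST]; dists[v] is a list of (point, probability) pairs for node v."""
    total = 0.0
    # one term per realization: the product of the per-node choices
    for combo in itertools.product(*dists):
        prob = math.prod(p for _, p in combo)
        total += prob * mst_length([pt for pt, _ in combo])
    return total
```

For two nodes, one fixed at the origin and one at distance 1 or 3 with equal probability, the function returns the average of the two MST lengths.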

In a closely related stochastic graph model, the location of a node is a fixed point, but the existence of
the node is probabilistic (called the existential uncertainty model). Kamousi, Chan and Suri [23] initiated
the study of estimating the expected length of combinatorial objects in the existential uncertainty model.
They showed that the expected lengths of the nearest neighbor (NN) graph, the Gabriel graph
(GG), the relative neighborhood graph (RNG), and the Delaunay triangulation (DT) can be computed exactly in
polynomial time, while computing E[MST] is #P-hard and there exists a simple FPRAS for approximating
E[MST]. They also gave a deterministic PTAS for approximating E[MST] in the Euclidean plane. In addition, they
studied the problem of computing E[MST] in the locational uncertainty model. They showed that the problem
is also #P-hard and gave a constant factor approximation algorithm for a special case of the problem.

1.1 Our Contributions 

We recall that a fully polynomial randomized approximation scheme (FPRAS) for a problem $f$ is a randomized
algorithm $A$ that takes an input instance $x$ and a real number $\epsilon > 0$, returns $A(x)$ such that
$\Pr[(1 - \epsilon) f(x) \le A(x) \le (1 + \epsilon) f(x)] \ge \frac{3}{4}$, and runs in time polynomial in both the size of the input $n$ and $1/\epsilon$.
Perhaps the simplest and most commonly used technique for estimating the expectation of a random variable
is the naive Monte Carlo method, that is, to use the sample average as the estimate. However, the method is
only efficient (i.e., runs in polynomial time) if the variance of the random variable is small (more precisely,
we need the ratio between the maximum possible value and the expected value to be bounded by a polynomial;
see Lemma 1). To circumvent the difficulty caused by the high variance, a general methodology is
to decompose the expectation of the random variable into a convex combination of conditional expectations
using the law of total expectation:

\[ \mathbb{E}[X] = \mathbb{E}_Y\big[\mathbb{E}[X \mid Y]\big] = \sum_{y} \Pr[Y = y]\, \mathbb{E}[X \mid Y = y]. \]

Hopefully, the probabilities $\Pr[Y = y]$ can be estimated (or calculated exactly) efficiently, and the random
variable $X$ conditioned on each event $Y = y$ has a low variance, so that we can estimate the conditional expectation
efficiently as well using Monte Carlo. However, choosing the right events to condition on can be tricky.
For example, the FPRAS developed in [23] for estimating the expected length of the minimum spanning
tree in the existential uncertainty model follows the general conditional expectation methodology. Roughly
speaking, the events to condition on are of the form "both $s$ and $t$ are active (present) and $t$ is the furthest
vertex from $s$". In fact, conditioning on such an event, it is easy to see that the length of any spanning tree
is at most $n \cdot d(s, t)$ and at least $d(s, t)$. Therefore, by the Chernoff bound, we can show that the number of samples
required for obtaining a $(1 \pm \epsilon)$-estimate of the conditional expectation is bounded by a polynomial.
In fact, we also show that the same idea can be extended to give an alternative FPRAS for the minimum
spanning tree in the locational uncertainty model (Appendix A). However, it is not clear how to extend this
technique to the minimum perfect matching problem and the minimum cycle cover problem. In particular,
the ratio between the maximum possible length of any perfect matching (or cycle cover) and the expected
length cannot be bounded by fixing the positions of any constant number of vertices.

Our FPRASs for all three problems considered in this paper, the minimum spanning tree, the minimum 
perfect matching and the minimum cycle cover, also follow the general methodology. However, the events 
we choose to condition on are quite different from the previous work [23] and are quite indirect, in our 
opinion. Our main contributions and the highlights of our techniques can be summarized as follows: 

1. (Section 2) We develop a new technique to devise FPRASs for estimating the expected length of
combinatorial structures in a stochastic graph. We first demonstrate an application of this technique to
the minimum spanning tree problem (MST). We obtain an FPRAS for estimating E[MST], which
improves upon the previously known constant factor approximation algorithm [23]. Note that the problem
is known to be #P-hard [23]. Now, we give a high level sketch of our technique. We first identify
a "core" set $H$ of points (we call $H$ the home) such that with probability close to 1, all nodes realize
to $H$. Moreover, estimating the expectation conditioned on the event that all nodes realize to $H$
can be done using the naive Monte Carlo method since we can show that the ratio between the maximum MST length and
E[MST] can be bounded by a polynomial. The problematic part is when some nodes realize to points
outside home. Even though the probability of such events is very small, the length of the MST under
such events may be considerably large, thus contributing nontrivially to E[MST]. However, we can
show that the contribution of such events is dominated by a subset of events where only one node realizes
outside home. In other words, the contribution of the events where more than one node is outside
home is negligible and can be safely ignored. Our technique seems more flexible and extendable than
the previous technique (at least to minimum perfect matching) and we view it as a key contribution of
this paper.

2. (Section 3) As a more interesting application of our "home set" technique, we give the first FPRAS (to
the best of our knowledge) for approximating the expected length of the minimum perfect matching
(MPM) in a stochastic graph. Our algorithm is technically more involved than the one for MST. We
assume that there is an even number of nodes. There are two major modifications. First, the home
set $H$ consists of several clusters of points, so that with probability close to 1, each cluster contains
an even number of nodes. We can also estimate the expectation conditioned on the event that all nodes
realize to $H$ using the Monte Carlo method. Second, in order to show that the contribution of the
events where more than one node is out of home is negligible, we need several structural properties
of perfect matchings and a more careful charging argument.

3. (Section 4) We show that the problem of computing the expected length of the minimum cycle cover
(CC) in a stochastic graph admits an FPRAS. We allow cycles with two nodes. To the best of our
knowledge, it is the first known algorithm for this problem. The event we choose to condition on is of
the form "edge $e$ is the longest edge in the nearest neighbor graph (NN)". Even though NN can be
very different from CC, we show that, interestingly, by conditioning on such events, estimating CC
becomes easier in most cases. In some cases, estimating CC is still difficult, but we can show that the
contribution of those cases is negligible. This is done by noticing a relationship between the length
of NN and that of CC. Our algorithm can be extended to handle the case where the existence of each
node is uncertain and/or each cycle is required to contain at least three nodes.

In the appendix, we show that the algorithm developed in [23] can in fact be modified into an FPRAS
for estimating E[MST] in the locational uncertainty model (even though only a constant approximation for
a special case was claimed explicitly in that paper). However, it is not clear how their approach can be
extended to perfect matching and cycle cover. All of our algorithms run in polynomial time. However, we
have not attempted to optimize the exact running times.

1.2 Related Work 

Several geometric properties of a set of stochastic points have been studied extensively in the literature under
the term stochastic geometry. For instance, Beardwood et al. [7] show that if there are $n$ points uniformly and
independently distributed in $[0, 1]^2$, the minimal traveling salesman tour visiting them has an expected length
$\Omega(\sqrt{n})$. Asymptotic results for minimum spanning trees and minimum matchings on $n$ points uniformly
distributed in unit balls are established by Bertsimas and van Ryzin [9]. Similar results can be found in
e.g., [8, 24, 29]. Compared with results in stochastic geometry, we focus on efficient computation of the
statistics, instead of giving explicit mathematical formulas for them.

Recently, a number of researchers have begun to explore geometric computing under uncertainty, and
many classical computational geometry problems have been studied in different uncertainty models. Agarwal,
Cheng, Tao and Yi [2] studied the problem of indexing probabilistic points with continuous distributions
for range queries on a line. Kamousi, Chan and Suri [22] studied the closest pair and (approximate) nearest
neighbor problems (i.e., finding the point with the smallest expected distance from the query point) in the
existential uncertainty model. Agarwal, Efrat, Sankararaman, and Zhang [3] studied the same problem
in the locational uncertainty model under the Euclidean metric. The most probable $k$-nearest neighbor problem
and its variants have attracted a lot of attention in the database community (see e.g., [12, 10, 26]). Several
other problems have also been considered recently, such as computing the expected volume of a set of
probabilistic rectangles in a Euclidean space [32], skylines (Pareto curves) over probabilistic points [6, 1],
and shape fitting [28]. Instead of using probability theory, an alternative model to capture the uncertainty is
the robust model, where each point is assumed to lie in some uncertain region and we are interested in the
extreme values of the combinatorial objects. For a comprehensive treatment of this model, see Löffler's
thesis [27] and the references therein.

The randomly weighted graph model where the edge weights are independent nonnegative variables
has also been studied extensively. Frieze [17] and Steele [30] showed that the expected value of the minimum
spanning tree on such a graph with identically and independently distributed edges is $\zeta(3)/D$, where
$\zeta(3) = \sum_{j=1}^{\infty} 1/j^3$ and $D$ is the derivative of the distribution at 0. Alexopoulos and Jacobson [4] developed
algorithms that compute the distribution of MST and the probability that a particular edge belongs to
MST when edge lengths follow discrete distributions. However, the running times of their algorithms may
be exponential in the worst case. Recently, Emek, Korman and Shavitt [16] showed that computing the
$k$th moment of a class of properties, including the diameter, radius and minimum spanning tree, admits an
FPRAS for every fixed $k$. Our model differs from their model in that the edge lengths in our model are not
independent.

The computational/algorithmic aspects of stochastic geometry have also gained a lot of attention in recent
years from the area of wireless networking. In many application scenarios, it is common to assume the
nodes (e.g., sensors) are deployed randomly across a certain area, thereby forming a stochastic network. It
is of central importance to study various properties of this network, such as connectivity [18] and transmission
capacity [19]. We refer the interested reader to a recent survey [20] for more references.

2 Minimum Spanning Trees 

In this section, we assume the presence of each node is certain but its location is stochastic. We use the term
nodes (or vertices) to refer to the vertices $V$ of the spanning tree and points (or locations) to refer to the points in the
metric space $\mathcal{P}$. We have $|V| = n$ and $|\mathcal{P}| = m$. We first assume the distribution of the location of each node
is discrete. For any node $v \in V$ and point $s \in \mathcal{P}$, we use the notation $v \vDash s$ to denote the event that node $v$
is present at point $s$. Let $p_{vs} = \Pr[v \vDash s]$, i.e., the probability that node $v$ is present at point $s$. Since node $v$
is present with certainty, we have $\sum_{s \in \mathcal{P}} p_{vs} = 1$. For a point $s$, we let $p(s)$ denote the expected number
of nodes realized at $s$, i.e., $p(s) = \sum_{v \in V} p_{vs}$. For a set $H$ of points, let $p(H) = \sum_{s \in H} p(s)$, i.e., the expected
number of nodes realized in $H$. For a set $H$ of points and a set $S$ of nodes, we use $H\langle S\rangle$ to denote the event
that all and only the nodes in $S$ are realized to some points in $H$. If $S$ contains only one node, say $v$, we use
the notation $H\langle v\rangle$ as shorthand for $H\langle \{v\}\rangle$. Let $H\langle i\rangle$ denote the event $\bigvee_{S : |S| = i} H\langle S\rangle$, i.e., the event
that exactly $i$ nodes are in $H$. We use $\mathrm{diam}(H)$, called the diameter of $H$, to denote $\max_{s, t \in H} d(s, t)$. Let
$d(p, H)$ denote the closest distance between point $p$ and any point in $H$.

2.1 The Naive Monte Carlo Method 

Before describing our algorithm, we first consider the naive Monte Carlo strategy, which is an important
building block in our later developments. In each Monte Carlo iteration, we take a sample (a realization of
all nodes) and compute the length of the MST on the sample. At the end, we output the average MST length
over all samples. The number of samples required by this algorithm is suggested by the following standard
Chernoff bound.

Lemma 1 (Chernoff Bound) Let $X_1, X_2, \ldots, X_N$ be independent random variables taking
values between 0 and $U$. Let $X = \frac{1}{N} \sum_{i=1}^{N} X_i$ and let $\mu$ be the expectation of $X$. Then, for any $\epsilon > 0$,

\[ \Pr\big[ X \in [(1 - \epsilon)\mu, (1 + \epsilon)\mu] \big] \ge 1 - 2 e^{-N \frac{\mu}{U} \epsilon^2 / 4}. \]

Therefore, for any $\epsilon > 0$, in order to get a $(1 \pm \epsilon)$-approximation with probability $1 - \frac{1}{\mathrm{poly}(n)}$, the number
of samples needs to be $O\big(\frac{U}{\mu \epsilon^2} \log n\big)$. If $\frac{U}{\mu}$, the ratio between the maximum possible length of any MST
and the expected length E[MST], is bounded by $\mathrm{poly}(m, n, \frac{1}{\epsilon})$, we can use the above Monte Carlo method
to estimate E[MST] with a polynomial number of samples. Since we use this condition often, we devote a
separate definition to it.

Definition 1 We call a random variable $X$ a nice instance if the ratio between the maximum possible value
of $X$ and the expected value $\mathbb{E}[X]$ is bounded by $\mathrm{poly}(m, n, \frac{1}{\epsilon})$.
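The naive strategy described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes Euclidean points in the plane and a hypothetical input format `dists[v] = [(point, probability), ...]`, and the number of samples is left as a parameter that the caller would choose according to Lemma 1.

```python
import math
import random


def mst_length(points):
    """Length of the Euclidean MST over a list of 2-D points (Prim, O(k^2))."""
    k = len(points)
    if k <= 1:
        return 0.0
    best = [math.inf] * k
    in_tree = [False] * k
    best[0] = 0.0
    total = 0.0
    for _ in range(k):
        u = min((i for i in range(k) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for w in range(k):
            if not in_tree[w]:
                best[w] = min(best[w], math.dist(points[u], points[w]))
    return total


def sample_realization(dists, rng):
    """Draw one location per node, independently, from its discrete distribution."""
    return [
        rng.choices([pt for pt, _ in d], weights=[p for _, p in d])[0]
        for d in dists
    ]


def monte_carlo_expected_mst(dists, num_samples, seed=0):
    """Sample average of the MST length over num_samples realizations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        total += mst_length(sample_realization(dists, rng))
    return total / num_samples
```

The estimate is reliable with few samples only on a nice instance in the sense of Definition 1; a rare but very long realization would otherwise dominate the variance.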

2.2 Our FPRAS for MST 

We first give a high level overview of our technique. Following the general conditional expectation methodology,
we break E[MST] into a linear sum of conditional expectations. The events we choose to condition
on depend on the notion of home, which is a set $H$ of points with two nice properties: (1) with probability
close to 1, all vertices are realized in the home $H$, and (2) the ratio between the diameter of $H$ and the
expected length of the MST conditioned on all nodes being at home is bounded by a polynomial. Each event
is of the form $H\langle i\rangle$ (i.e., exactly $i$ nodes are realized in $H$) for some $i \ge 0$. Thus, it suffices to estimate
$\Pr[H\langle i\rangle]\, \mathbb{E}[\mathrm{MST} \mid H\langle i\rangle]$ for each $i$. However, our final estimate only consists of the first two terms:
$\Pr[H\langle n\rangle]\, \mathbb{E}[\mathrm{MST} \mid H\langle n\rangle]$ and $\Pr[H\langle n-1\rangle]\, \mathbb{E}[\mathrm{MST} \mid H\langle n-1\rangle]$. We can show that the contribution from
the rest of the terms (where more than one node is outside home) is negligible and can be safely ignored. To
estimate the first term, we use the second property of $H$, which guarantees that the MST conditioned on $H\langle n\rangle$
is a nice instance. The second term can be estimated similarly.

The details of our estimation algorithm are as follows. First, we find in poly-time a set $H$ of points (see
Lemma 3 below) such that the following two properties hold:

P1. $p(H) \ge n - \frac{\epsilon}{16}$, and

P2. $\mathbb{E}[\mathrm{MST} \mid H\langle n\rangle] = \Omega\big(\mathrm{diam}(H)\, \frac{\epsilon^2}{m^2}\big)$.

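We call $H$ the home of all nodes (due to the first property). We note that $H$ depends on the error parameter
$\epsilon$. Let $F = \mathcal{P} \setminus H$. By the law of total expectation, the expected length of the minimum spanning tree can
be expanded as follows:

\[ \mathbb{E}[\mathrm{MST}] = \sum_{i \ge 0} \mathbb{E}[\mathrm{MST} \mid F\langle i\rangle] \cdot \Pr[F\langle i\rangle]. \]

Interestingly, we can show that the contribution of all terms except the first two is negligible (Lemma 5).
Therefore, it suffices to focus on estimating the first two terms (note that $F\langle 0\rangle = H\langle n\rangle$):

\[ \mathbb{E}[\mathrm{MST} \mid H\langle n\rangle] \cdot \Pr[H\langle n\rangle] \quad \text{and} \quad \mathbb{E}[\mathrm{MST} \mid F\langle 1\rangle] \cdot \Pr[F\langle 1\rangle]. \]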

Now, we present the details of how to get a $(1 \pm \epsilon)$-estimate for both terms with $O\big(\frac{n m^2}{\epsilon^4} \ln n\big)$ samples.

Estimating the first term: Due to (P2), we have a nice instance and can therefore obtain a $(1 \pm \epsilon)$-estimate for
$\mathbb{E}[\mathrm{MST} \mid H\langle n\rangle]$ using the Monte Carlo method with $O\big(\frac{n m^2}{\epsilon^4} \ln n\big)$ samples satisfying $H\langle n\rangle$ (by Lemma 1).
To obtain samples satisfying $H\langle n\rangle$ efficiently, we simply use rejection sampling, i.e., we reject
all samples for which $H\langle n\rangle$ does not hold. By the first property of $H$, a sample satisfies
$H\langle n\rangle$ with probability close to 1. So, the expected time to obtain a useful sample is bounded by a constant. Overall, we can obtain a
$(1 \pm \epsilon)$-estimate of the first term using $O\big(\frac{n m^2}{\epsilon^4} \ln n\big)$ samples with high probability.
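The rejection-sampling step for the first term can be sketched as follows. This is an illustrative sketch under stated assumptions: Euclidean points in the plane, a hypothetical input format `dists[v] = [(point, probability), ...]`, a home set passed in as a collection of points, and a `max_rejects` guard that is not part of the paper and only protects against a badly chosen home.

```python
import math
import random


def mst_length(points):
    """Length of the Euclidean MST over a list of 2-D points (Prim, O(k^2))."""
    k = len(points)
    if k <= 1:
        return 0.0
    best = [math.inf] * k
    in_tree = [False] * k
    best[0] = 0.0
    total = 0.0
    for _ in range(k):
        u = min((i for i in range(k) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for w in range(k):
            if not in_tree[w]:
                best[w] = min(best[w], math.dist(points[u], points[w]))
    return total


def sample_realization(dists, rng):
    """Draw one location per node, independently, from its discrete distribution."""
    return [
        rng.choices([pt for pt, _ in d], weights=[p for _, p in d])[0]
        for d in dists
    ]


def estimate_mst_given_all_home(dists, home, num_samples, seed=0,
                                max_rejects=1_000_000):
    """Average MST length over samples in which every node lands in `home`."""
    rng = random.Random(seed)
    home = set(home)
    total, accepted, rejected = 0.0, 0, 0
    while accepted < num_samples:
        r = sample_realization(dists, rng)
        if all(pt in home for pt in r):      # the sample satisfies "all nodes at home"
            total += mst_length(r)
            accepted += 1
        else:                                # reject and redraw
            rejected += 1
            if rejected > max_rejects:
                raise RuntimeError("home is hit too rarely for rejection sampling")
    return total / accepted
```

Because the home captures almost all of the probability mass, the expected number of draws per accepted sample stays close to one.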
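Estimating the second term: To compute the second term, we first rewrite it as follows:

\[ \mathbb{E}[\mathrm{MST} \mid F\langle 1\rangle] \cdot \Pr[F\langle 1\rangle] = \sum_{v \in V} \mathbb{E}[\mathrm{MST} \mid F\langle v\rangle]\, \Pr[F\langle v\rangle] = \sum_{v \in V} \Big( \sum_{s \in F} \Pr[F\langle v\rangle \wedge v \vDash s]\; \mathbb{E}[\mathrm{MST} \mid F\langle v\rangle, v \vDash s] \Big). \]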
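Fix a node $v$. To estimate $\sum_{s \in F} \Pr[F\langle v\rangle \wedge v \vDash s]\, \mathbb{E}[\mathrm{MST} \mid F\langle v\rangle, v \vDash s]$, we break it into two parts:

1. We first estimate the sum $\sum_{s :\, d(s, H) < \frac{n}{\epsilon} \mathrm{diam}(H)} \Pr[F\langle v\rangle, v \vDash s]\, \mathbb{E}[\mathrm{MST} \mid F\langle v\rangle, v \vDash s]$. Let $\mathrm{Cl}(v)$ be
the event that $v$ is the only node that realizes to some point $s \notin H$ and $d(s, H) < \frac{n}{\epsilon} \mathrm{diam}(H)$. Notice
that the sum is in fact $\Pr[\mathrm{Cl}(v)] \cdot \mathbb{E}[\mathrm{MST} \mid \mathrm{Cl}(v)]$. We can see that $\Pr[\mathrm{Cl}(v)]$ can be computed exactly
in linear time. Our estimate of $\mathbb{E}[\mathrm{MST} \mid \mathrm{Cl}(v)]$ is the average of polynomially many samples (the samples
are taken under the condition $\mathrm{Cl}(v)$). We argue that the quality of this estimate is good by considering
the following two cases:

(a) Assume that $\mathbb{E}[\mathrm{MST} \mid \mathrm{Cl}(v)] \ge \frac{1}{2} \mathbb{E}[\mathrm{MST} \mid H\langle n\rangle] \ge \Omega\big(\frac{\epsilon^2}{m^2}\big) \mathrm{diam}(H)$. In this case, we have
a nice instance. This is because under the condition $\mathrm{Cl}(v)$, the maximum possible length of any
minimum spanning tree is $O\big(\frac{n}{\epsilon} \mathrm{diam}(H)\big)$. Hence we can use Monte Carlo to get a $(1 \pm \epsilon)$-approximation
of $\mathbb{E}[\mathrm{MST} \mid \mathrm{Cl}(v)]$ with polynomially many samples (by Lemma 1).

(b) Otherwise, we assume that $\mathbb{E}[\mathrm{MST} \mid \mathrm{Cl}(v)] < \frac{1}{2} \mathbb{E}[\mathrm{MST} \mid H\langle n\rangle]$. The probability that the
sample average is larger than $\mathbb{E}[\mathrm{MST} \mid H\langle n\rangle]$ is at most $\mathrm{poly}(\frac{1}{n})$ by the Chernoff bound. The
probability that, for all such nodes $v$, the sample averages are at most $\mathbb{E}[\mathrm{MST} \mid H\langle n\rangle]$ is at least
$1 - \mathrm{poly}(\frac{1}{n})$ by the union bound. If this is the case, we can see that their total contribution to the final
estimate of E[MST] is less than $\epsilon\, \mathbb{E}[\mathrm{MST} \mid H\langle n\rangle] \Pr[H\langle n\rangle]$. In fact, this is because

\[ \sum_{v} \Pr[\mathrm{Cl}(v)] \cdot \mathbb{E}[\mathrm{MST} \mid \mathrm{Cl}(v)] \le \sum_{v} \Pr[\mathrm{Cl}(v)] \cdot \mathbb{E}[\mathrm{MST} \mid H\langle n\rangle] \le \epsilon\, \mathbb{E}[\mathrm{MST} \mid H\langle n\rangle] \Pr[H\langle n\rangle]. \]

The last inequality is due to the fact that $\sum_{v \in V} \Pr[\mathrm{Cl}(v)] \le n - p(H) \le \epsilon/16 \le \epsilon \Pr[H\langle n\rangle]$.

2. In the other part, each term has $d(s, H) \ge \frac{n}{\epsilon} \cdot \mathrm{diam}(H)$. We just use $d(s, H)$ as the estimate of
$\mathbb{E}[\mathrm{MST} \mid F\langle v\rangle, v \vDash s]$. This is because the length of the MST is always at least $d(s, H)$ and at most
$d(s, H) + n \cdot \mathrm{diam}(H) \le (1 + \epsilon)\, d(s, H)$.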
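The two-part split of the out-of-home points can be sketched as below. This is a hedged illustration: it assumes Euclidean points in the plane, takes the cutoff to be $\frac{n}{\epsilon} \mathrm{diam}(H)$ (the value that makes the bound $d(s, H) + n \cdot \mathrm{diam}(H) \le (1 + \epsilon) d(s, H)$ go through), and uses the hypothetical name `split_outside_points`.

```python
import math


def split_outside_points(points, home, n, eps):
    """Split points outside `home` into 'near' ones (to be handled by Monte
    Carlo) and 'far' ones (for which d(s, H) itself serves as the estimate)."""
    home = list(home)
    diam = max(math.dist(a, b) for a in home for b in home)
    cutoff = (n / eps) * diam
    near, far = [], {}
    for s in points:
        if s in home:
            continue
        d = min(math.dist(s, h) for h in home)   # d(s, H)
        if d < cutoff:
            near.append(s)
        else:
            far[s] = d   # MST length lies in [d, (1 + eps) * d] for such s
    return near, far
```

For a far point the returned distance is already a $(1 + \epsilon)$-approximation of the conditional MST length, so no sampling is needed there.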



2.3 Finding Home 

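The remaining task is to show how to find the home $H$. We need the following simple lemma.

Lemma 2 Consider two points $s$ and $t$ in $\mathcal{P}$. Suppose no node contributes more than one half of both
$p(s)$ and $p(t)$ (i.e., $\nexists v \in V$ s.t. $p_{vs} > 0.5\, p(s)$ and $p_{vt} > 0.5\, p(t)$). Then, we have that

\[ \Pr[\exists v \ne u,\ v \vDash s,\ u \vDash t] = \Omega(p(s) p(t)). \]

Proof: According to the given conditions, we have that

\[ \frac{p_{vs} p_{vt}}{p(s) p(t)} \le \frac{1}{4} \left( \frac{p_{vs}}{p(s)} + \frac{p_{vt}}{p(t)} \right)^2 \le \frac{3}{8} \left( \frac{p_{vs}}{p(s)} + \frac{p_{vt}}{p(t)} \right). \]

Then, we can see that

\[ \Pr[\exists v \ne u,\ v \vDash s,\ u \vDash t] = p(s) p(t) - \sum_{v \in V} p_{vs} p_{vt} \ge p(s) p(t) - \sum_{v \in V} \frac{3}{8}\, p(s) p(t) \left( \frac{p_{vs}}{p(s)} + \frac{p_{vt}}{p(t)} \right) \ge \frac{1}{4}\, p(s) p(t). \]

In the first display, the second inequality holds since $\frac{p_{vs}}{p(s)}$ and $\frac{p_{vt}}{p(t)}$ are not both larger than 1/2, so their sum is at most 3/2. The last inequality above holds since $\sum_{v} \frac{p_{vs}}{p(s)} = \sum_{v} \frac{p_{vt}}{p(t)} = 1$. □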
With this lemma at hand, finding the home $H$ is not difficult, as shown in the next lemma.

Lemma 3 There is a set $H$ of points such that

P1. $p(H) \ge n - \frac{\epsilon}{16} = n - O(\epsilon)$, and

P2. $\mathbb{E}[\mathrm{MST} \mid H\langle n\rangle] = \Omega\big(\mathrm{diam}(H)\, \frac{\epsilon^2}{m^2}\big)$.

Furthermore, we can find such $H$ in linear time.

Proof: For each ordered pair of points $(s, t)$, consider $H_{st} = B(s, d(s, t))$, the ball centered at $s$ with
radius $d(s, t)$. Consider the furthest two points among all points $r$ with $p(r) \ge \frac{\epsilon}{16m}$. Suppose the two
points are $s$ and $t$. For each point $r$ that is not in $H_{st}$, we know $p(r) < \frac{\epsilon}{16m}$. Therefore, we have that
$p(\mathcal{P} \setminus H_{st}) < \frac{\epsilon}{16}$ and $p(H_{st}) \ge n - \frac{\epsilon}{16}$. Consider two cases:

1. There is no node $v \in V$ such that $p_{vs} > 0.5\, p(s)$ and $p_{vt} > 0.5\, p(t)$. In this case, by Lemma 2, we
have that

\[ \mathbb{E}[\mathrm{MST} \mid H_{st}\langle n\rangle] \ge d(s, t) \Pr[\exists v \ne u,\ v \vDash s,\ u \vDash t] \ge \Omega\Big( d(s, t)\, \frac{\epsilon^2}{m^2} \Big). \]

2. There is a node $v$ such that $p_{vs} > 0.5\, p(s)$ and $p_{vt} > 0.5\, p(t)$. In this case, conditioning on the event
that a different node $u$ is realized to an arbitrary point $q$,

\[ \mathbb{E}[\mathrm{MST} \mid H_{st}\langle n\rangle] \ge d(s, q) \Pr[v \vDash s] + d(t, q) \Pr[v \vDash t] \ge d(s, t)\, \frac{\epsilon}{32m}. \]

In either case, $H_{st}$ satisfies both P1 and P2. □
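The construction in the proof can be sketched in Python as follows. This is a minimal sketch under stated assumptions: Euclidean points in the plane, a hypothetical input format in which `dists[v]` is a dictionary mapping points to probabilities, the mass threshold $\frac{\epsilon}{16m}$ as read from the proof, and $\epsilon < 16$ so that at least one point is heavy. The furthest pair is found by a quadratic scan over the heavy points, which is simpler but slower than what the lemma claims.

```python
import math


def find_home(points, dists, eps):
    """Return (home, mass): the ball B(s, d(s, t)) around the furthest pair
    (s, t) of heavy points, and the expected node count p(r) at each point."""
    m = len(points)
    # p(r): expected number of nodes realized at r
    mass = {r: sum(d.get(r, 0.0) for d in dists) for r in points}
    threshold = eps / (16 * m)
    heavy = [r for r in points if mass[r] >= threshold]
    # furthest pair among heavy points (quadratic scan)
    s, t = max(((a, b) for a in heavy for b in heavy),
               key=lambda ab: math.dist(ab[0], ab[1]))
    radius = math.dist(s, t)
    home = [r for r in points if math.dist(s, r) <= radius]
    return home, mass
```

Every heavy point lies in the ball by the choice of the furthest pair, so the points left outside carry total mass below $\frac{\epsilon}{16}$, which is property P1.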



2.4 Analysis of the Performance Guarantee 

Now, we analyze the performance guarantee of our algorithm. We need to show that the total contribution
from the scenarios where more than one node is not at home is very small. We need some notation first.
Suppose $S$ is the set of nodes out of home $H$. We use $\mathcal{F}_S$ to denote the set of all possible realizations
of all nodes in $S$ to points in $F$ (we can think of each element in $\mathcal{F}_S$ as an $|S|$-dimensional vector where
each coordinate is indexed by a vertex in $S$ and its value is a point in $F$). Similarly, we denote the set of
realizations of $\bar{S} = V \setminus S$ to points in $H$ by $\mathcal{H}_{\bar{S}}$. For any $F_S \in \mathcal{F}_S$ and $H_{\bar{S}} \in \mathcal{H}_{\bar{S}}$, we use $(F_S, H_{\bar{S}})$ to
denote the event that both $F_S$ and $H_{\bar{S}}$ happen and $\mathrm{MST}(F_S, H_{\bar{S}})$ the length of the minimum spanning tree
under the realization $(F_S, H_{\bar{S}})$. We need the following combinatorial fact.

Lemma 4 Consider a particular realization $(F_S, H_{\bar{S}})$ where $S$ is the set of nodes out of home $H$ and $|S| \ge 2$.
The realization $(F_{S'}, H_{\bar{S'}})$ is obtained from $(F_S, H_{\bar{S}})$ by sending home the node that is outside $H$ but closest
to any node in $H_{\bar{S}}$. Then $\mathrm{MST}(F_S, H_{\bar{S}}) \le 4\, \mathrm{MST}(F_{S'}, H_{\bar{S'}})$.

Proof: For $(F_S, H_{\bar{S}})$, let $d = \min_{v \in F_S, u \in H_{\bar{S}}} d(u, v)$. Then we have

\[ 2\, \mathrm{MST}(F_{S'}, H_{\bar{S'}}) \ge \mathrm{MST}(F_{S'}, H_{\bar{S'}}) + d \ge \frac{1}{2} \mathrm{MST}(F_{S'}, H_{\bar{S}}) + d \ge \frac{1}{2} \mathrm{MST}(F_S, H_{\bar{S}}). \]

The second inequality holds since the length of the minimum spanning tree is at most two times the length of
the minimum Steiner tree (we can think of $\mathrm{MST}(F_{S'}, H_{\bar{S'}})$ as a Steiner tree connecting all nodes in $F_{S'} \cup H_{\bar{S}}$). □

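The following lemma is essential in establishing the performance guarantee.

Lemma 5 For any $\epsilon > 0$, if $H$ satisfies the properties in Lemma 3, we have that

\[ \sum_{i > 1} \mathbb{E}[\mathrm{MST} \mid F\langle i\rangle] \cdot \Pr[F\langle i\rangle] \le \epsilon \cdot \mathbb{E}[\mathrm{MST} \mid F\langle 1\rangle] \cdot \Pr[F\langle 1\rangle]. \]

Proof: We claim that for any $i \ge 1$,

\[ \mathbb{E}[\mathrm{MST} \mid F\langle i+1\rangle] \cdot \Pr[F\langle i+1\rangle] \le \frac{\epsilon}{2}\, \mathbb{E}[\mathrm{MST} \mid F\langle i\rangle] \cdot \Pr[F\langle i\rangle]. \]

If the claim is true, then we can show the lemma easily by noticing that, for any $n \ge 2$,

\[ \sum_{i > 1} \mathbb{E}[\mathrm{MST} \mid F\langle i\rangle] \Pr[F\langle i\rangle] \le \sum_{i=1}^{n-1} \Big(\frac{\epsilon}{2}\Big)^{i}\, \mathbb{E}[\mathrm{MST} \mid F\langle 1\rangle] \Pr[F\langle 1\rangle] \le \epsilon\, \mathbb{E}[\mathrm{MST} \mid F\langle 1\rangle] \Pr[F\langle 1\rangle]. \]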
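The last step is a geometric series. As a worked check, write $T_i = \mathbb{E}[\mathrm{MST} \mid F\langle i\rangle] \Pr[F\langle i\rangle]$ (a shorthand introduced here only for this check) and assume the per-step ratio in the claim is $\frac{\epsilon}{2}$ and that $0 < \epsilon \le 1$, which is the range in which the final bound goes through:

```latex
\[
\sum_{i > 1} T_i
\;\le\; \sum_{i \ge 1} \Big(\frac{\epsilon}{2}\Big)^{i} T_1
\;=\; \frac{\epsilon/2}{1 - \epsilon/2}\, T_1
\;\le\; \epsilon\, T_1 ,
\qquad \text{since } 1 - \frac{\epsilon}{2} \ge \frac{1}{2} \text{ for } \epsilon \le 1 .
\]
```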

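First, we rewrite the LHS as

\[ \mathbb{E}[\mathrm{MST} \mid F\langle i+1\rangle] \cdot \Pr[F\langle i+1\rangle] = \sum_{|S| = i+1}\; \sum_{F_S \in \mathcal{F}_S}\; \sum_{H_{\bar{S}} \in \mathcal{H}_{\bar{S}}} \Pr[(F_S, H_{\bar{S}})] \cdot \mathrm{MST}(F_S, H_{\bar{S}}). \]

Similarly, we have the RHS written as

\[ \mathbb{E}[\mathrm{MST} \mid F\langle i\rangle] \cdot \Pr[F\langle i\rangle] = \sum_{|S'| = i}\; \sum_{F_{S'} \in \mathcal{F}_{S'}}\; \sum_{H_{\bar{S'}} \in \mathcal{H}_{\bar{S'}}} \Pr[(F_{S'}, H_{\bar{S'}})] \cdot \mathrm{MST}(F_{S'}, H_{\bar{S'}}). \]

For each pair $(F_S, H_{\bar{S}})$, let $C(F_S, H_{\bar{S}}) = \Pr[S \vDash F_S \wedge \bar{S} \vDash H_{\bar{S}}] \cdot \mathrm{MST}(F_S, H_{\bar{S}})$. Think of each pair $(F_S, H_{\bar{S}})$
with $|S| = i + 1$ as a seller and each pair $(F_{S'}, H_{\bar{S'}})$ with $|S'| = i$ as a buyer. The seller $(F_S, H_{\bar{S}})$ wants
to sell the term $C(F_S, H_{\bar{S}})$ and the buyers want to buy all these terms. The buyer $(F_{S'}, H_{\bar{S'}})$ has a budget
of $C(F_{S'}, H_{\bar{S'}})$. We show there is a charging scheme such that every term $C(F_S, H_{\bar{S}})$ is fully paid by the
buyers and each buyer spends at most an $\frac{\epsilon}{2}$ fraction of her budget. Note that the existence of such a charging
scheme suffices to prove the lemma.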

Suppose we are selling the term $C(F_S, H_{\bar{S}})$. Consider the following charging scheme. Suppose $v \in S$
is the node closest to any node in $H_{\bar{S}}$. Let $S' = S \setminus \{v\}$ and $F_{S'}$ be the restriction of $F_S$ to all coordinates in $S$
except $v$. We say $(F_{S'}, H_{\bar{S'}})$ is consistent with $(F_S, H_{\bar{S}})$, denoted as $(F_{S'}, H_{\bar{S'}}) \sim (F_S, H_{\bar{S}})$, if $H_{\bar{S'}}$ agrees
with $H_{\bar{S}}$ for all vertices in $\bar{S}$ and $F_S$ agrees with $F_{S'}$ for all vertices in $S'$. Intuitively, $(F_{S'}, H_{\bar{S'}})$ can be
obtained from $(F_S, H_{\bar{S}})$ by sending $v$ to an arbitrary point in the home. Let

\[ Z(F_S, H_{\bar{S}}) = \sum_{(F_{S'}, H_{\bar{S'}}) \sim (F_S, H_{\bar{S}})} \Pr[(F_{S'}, H_{\bar{S'}})]. \]

For each buyer $(F_{S'}, H_{\bar{S'}}) \sim (F_S, H_{\bar{S}})$, we charge her the following amount of money:

\[ \frac{\Pr[(F_{S'}, H_{\bar{S'}})]}{Z(F_S, H_{\bar{S}})}\; C(F_S, H_{\bar{S}}). \]
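It is easy to see that $C(F_S, H_{\bar{S}})$ is fully paid by all buyers consistent with $(F_S, H_{\bar{S}})$. It remains to show that
each buyer $(F_{S'}, H_{\bar{S'}})$ has been charged at most $\frac{\epsilon}{2} C(F_{S'}, H_{\bar{S'}})$. By the above charging scheme, the terms
in the LHS that are charged to buyer $(F_{S'}, H_{\bar{S'}})$ are consistent with $(F_{S'}, H_{\bar{S'}})$. Therefore, the total amount
charged to buyer $(F_{S'}, H_{\bar{S'}})$ is

\begin{align*}
\sum_{(F_S, H_{\bar{S}}) \sim (F_{S'}, H_{\bar{S'}})} \frac{\Pr[F_{S'}, H_{\bar{S'}}]}{Z(F_S, H_{\bar{S}})}\, C(F_S, H_{\bar{S}})
&\le 4\, \mathrm{MST}(F_{S'}, H_{\bar{S'}}) \cdot \sum_{(F_S, H_{\bar{S}}) \sim (F_{S'}, H_{\bar{S'}})} \frac{\Pr[F_{S'}, H_{\bar{S'}}]}{Z(F_S, H_{\bar{S}})}\, \Pr[F_S, H_{\bar{S}}] \\
&= 4\, \mathrm{MST}(F_{S'}, H_{\bar{S'}}) \Pr[F_{S'}, H_{\bar{S'}}] \cdot \sum_{(F_S, H_{\bar{S}}) \sim (F_{S'}, H_{\bar{S'}})} \frac{\Pr[F_S, H_{\bar{S}}]}{Z(F_S, H_{\bar{S}})} \\
&\le 4\, \mathrm{MST}(F_{S'}, H_{\bar{S'}}) \Pr[F_{S'}, H_{\bar{S'}}] \cdot \sum_{v \in \bar{S'}} \frac{\Pr[v \vDash F]}{\Pr[v \vDash H]} \\
&\le \frac{\epsilon}{2}\, \mathrm{MST}(F_{S'}, H_{\bar{S'}}) \Pr[F_{S'}, H_{\bar{S'}}].
\end{align*}

The first inequality follows from Lemma 4. To see the second inequality, for a fixed vertex $v$, consider the
quantity $\sum_{(F_S, H_{\bar{S}}) \sim (F_{S'}, H_{\bar{S'}}),\; S' = S \setminus \{v\}} \frac{\Pr[F_S, H_{\bar{S}}]}{Z(F_S, H_{\bar{S}})}$. By the definition of $Z$, we can see that the denominators
of all terms are in fact the same. Canceling out the same multiplicative terms from the numerators and the
denominator, we can see it is at most $\frac{\Pr[v \vDash F]}{\Pr[v \vDash H]}$. □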

In sum, we obtain the following theorem. 

Theorem 1 There is an FPRAS for estimating the expected length of the minimum spanning tree in a 
stochastic graph. 

Finally, we note that we can use the $O(n\, \mathrm{poly}(1/\epsilon)\, \mathrm{polylog}\, n)$ time algorithm of [15] to estimate a $(1 + \epsilon)$-approximate
value of the minimum spanning tree, instead of the exact algorithms, in a general metric
graph. For Euclidean spaces with $O(1)$ dimensions, we can use the $O(\sqrt{n}\, \mathrm{poly}(1/\epsilon)\, \mathrm{polylog}\, n)$ time algorithm
in [14] for computing such an approximate value (under certain assumptions) or the $O(n\, \mathrm{poly}(1/\epsilon))$
time algorithm in [11].



3 Minimum Perfect Matchings 

In this section, we consider the minimum perfect matching problem. We assume the number of nodes, $n$,
is even. For a node $v$ and a set $H$ of points, let $p_v(H) = \sum_{s \in H} p_{vs}$. For two sets $H_1$ and $H_2$ of points,
let $d(H_1, H_2) = \min_{s \in H_1, t \in H_2} d(s, t)$. We use MPM to denote the length of the minimum length perfect
matching. Our goal is to estimate E[MPM].



3.1 Our FPRAS for MPM 

Our algorithm for MPM follows the same framework: we first identify the home such that the conditional
expectation conditioned on all nodes being at home can be estimated using the Monte Carlo method. We
can similarly show that the contribution from the scenarios where more than one node is outside home is
negligible. Thus, we only need to estimate two parts: (1) the expectation conditioned on all nodes being
at home, and (2) the expectation conditioned on exactly one node being not at home. There are two major
differences from the algorithm for MST. First, the home set is composed of several clusters of points,
instead of a single ball. Second, we need a more careful charging argument.

Now, we present the details of our algorithm. First, we find a collection of sets of points $H_1, \ldots, H_k$
such that the following properties hold.

Ql. For each node v, there is a (unique) set Hj such that p v (Hj) > 1 — 0(—^- s ). We call Hj the home of 
node v, denoted as H(v). 

Q2. For each ball Hj, the number of nodes v with Hj as its home (i.e., H(v) = Hj) is even. 
Q3. E[MPM] = where D = max^diam (#;)}■ 

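Let $H = \cup_i H_i$ and $F = \mathcal{P} \setminus H$. We use $H\langle n\rangle$ to denote the event that $v \vDash H(v)$ for all $n$ nodes $v$. We denote
the event that there are exactly $i$ nodes which realize out of their homes by $F\langle i\rangle$. By the previous discussion,
we only focus on estimating two terms: $\mathbb{E}[\mathrm{MPM} \mid H\langle n\rangle] \cdot \Pr[H\langle n\rangle]$ and $\mathbb{E}[\mathrm{MPM} \mid F\langle 1\rangle] \cdot \Pr[F\langle 1\rangle]$.

Estimating the first term: Note that $\Pr[H\langle n\rangle]$ is close to 1 (by the union bound) and can be computed exactly.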

' 2 5 

To estimate E[MPM | H(n)]], we take the average of 0( n ™ Inn) samples. We distinguish the following 
two cases. 

1. E[MPM | H{n}} > §E[MPM] > fi(^). We could get a (1 ± e) -approximation using the Monte 

2 5 

Carlo method with 0{ n ™ Inn) samples. This is because the maximum possible MPM length is at 
most nD and therefore we have a nice instance. 

2. E[MPM | H(n)] < §E[MPM]. Then the probability that the sample average is larger than eE[MPM] 
is at most poly(^) by Chernoff Bound. We can thus ignore this part safely. 

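Estimating the second term: We rewrite the second term as follows:

\[ \mathbb{E}[\mathrm{MPM} \mid F\langle 1\rangle] \cdot \Pr[F\langle 1\rangle] = \sum_{v \in V} \Big( \sum_{s} \Pr[F\langle v\rangle \wedge v \vDash s]\; \mathbb{E}[\mathrm{MPM} \mid F\langle v\rangle, v \vDash s] \Big). \]

Fix a particular node $v$. We break the sum into two parts as in the previous section:
$\sum_{s :\, d(s, H(v)) < \frac{n}{\epsilon} \mathrm{diam}(H)} \Pr[F\langle v\rangle, v \vDash s]\, \mathbb{E}[\mathrm{MPM} \mid F\langle v\rangle, v \vDash s]$ and
$\sum_{s :\, d(s, H(v)) \ge \frac{n}{\epsilon} \mathrm{diam}(H)} \Pr[F\langle v\rangle, v \vDash s]\, \mathbb{E}[\mathrm{MPM} \mid F\langle v\rangle, v \vDash s]$.
For the first part, we use Monte Carlo, and for the second part, we use $d(s, H(v))$
as the estimate of $\mathbb{E}[\mathrm{MPM} \mid F\langle v\rangle, v \vDash s]$. The details are exactly the same as in the previous section and
are omitted here.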

3.2 Finding Homes 

What remains now is to show how to find the home sets $H_1, \ldots, H_k$ in polynomial time. We need the following
lemma, which is useful in bounding $\mathbb{E}[\mathrm{MPM}]$ from below.

Lemma 6 For any two disjoint sets $H_1$ and $H_2$ of points, and any node $v$, we have

$$\mathbb{E}[\mathrm{MPM}] \ge \frac{\min\{p_v(H_1), p_v(H_2)\}}{m} \cdot d(H_1, H_2).$$

Proof: Suppose $s = \arg\max_{s'} \{p_{vs'} \mid s' \in H_1\}$ and $t = \arg\max_{t'} \{p_{vt'} \mid t' \in H_2\}$. Obviously, we have
$p_{vs} \ge p_v(H_1)/m$ and $p_{vt} \ge p_v(H_2)/m$. So it suffices to show $\mathbb{E}[\mathrm{MPM}] \ge \min\{p_{vs}, p_{vt}\} \cdot d(s, t)$. We first see that

$$\mathbb{E}[\mathrm{MPM}] \ge p_{vs}\,\mathbb{E}[\mathrm{MPM} \mid v \models s] + p_{vt}\,\mathbb{E}[\mathrm{MPM} \mid v \models t] \ge \min\{p_{vs}, p_{vt}\}\big(\mathbb{E}[\mathrm{MPM} \mid v \models s] + \mathbb{E}[\mathrm{MPM} \mid v \models t]\big).$$

Then it is sufficient to prove that $\mathbb{E}[\mathrm{MPM} \mid v \models s] + \mathbb{E}[\mathrm{MPM} \mid v \models t] \ge d(s, t)$. Fix a realization of all nodes
except $v$ and condition on this event. Consider the two minimum perfect matchings, one for the case $v \models s$
(denoted as $\mathrm{MPM}_1$) and the other for the case $v \models t$ (denoted as $\mathrm{MPM}_2$). Consider the symmetric difference

$$\mathrm{MPM}_1 \oplus \mathrm{MPM}_2.$$

We can see that it contains a path $(s, p_1, p_2, \ldots, p_k, t)$ such that $(s, p_1) \in \mathrm{MPM}_1$, $(p_1, p_2) \in \mathrm{MPM}_2$, $\ldots$, $(p_k, t) \in \mathrm{MPM}_2$.
So $\mathrm{MPM}_1 + \mathrm{MPM}_2 \ge d(s, t)$ by the triangle inequality. Therefore, we have
$\mathbb{E}[\mathrm{MPM} \mid v \models s] + \mathbb{E}[\mathrm{MPM} \mid v \models t] \ge d(s, t)$. □
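The key inequality in this proof, that moving a single node from $s$ to $t$ cannot leave both matchings shorter than $d(s,t)$ in total, can be checked numerically. The sketch below is an illustration rather than part of the paper: it assumes points in the unit square with Euclidean distance, and uses a brute-force matching routine suitable only for very small instances.

```python
import math
import random

def mpm_length(points):
    """Minimum perfect matching length by exhaustive recursion (small inputs only)."""
    if not points:
        return 0.0
    first, rest = points[0], points[1:]
    best = math.inf
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        best = min(best, math.dist(first, partner) + mpm_length(remaining))
    return best

def check_lemma6_inequality(trials=100, num_others=5, seed=1):
    """Check MPM(v at s) + MPM(v at t) >= d(s, t) on random instances.

    `num_others` must be odd so that the total number of nodes is even.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        others = [(rng.random(), rng.random()) for _ in range(num_others)]
        s = (rng.random(), rng.random())
        t = (rng.random(), rng.random())
        total = mpm_length([s] + others) + mpm_length([t] + others)
        if total < math.dist(s, t) - 1e-12:
            return False
    return True
```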

Now, we are ready to show how to find the home sets in polynomial time.

Lemma 7 We can find in polynomial time disjoint point sets $H_1, \ldots, H_k$ such that
QL For each node v, there is a unique ball Hj such that p v (Hj) > 1 — 0(— S)/ 
Q2. For all $j$, $|\{v \in V \mid H(v) = H_j\}|$ is even; and
Q3. E[MPM]=n(^). 

Proof: We gradually increase $t$, starting from 0. Consider the balls $B(s, t)$ for all points $s$ in $V$. Initially,
each ball is a singleton component. As we increase $t$, if two different components intersect, we merge them
into a new component. Consider the first time $T$ such that Q1 and Q2 are satisfied by those components.
Let those components be $H_1, \ldots, H_k$. Note that such a $T$ must exist, because the set of all points satisfies the
first two properties. Now, we show that Q3 also holds.

Recall $D = \max_j \operatorname{diam}(H_j)$. Firstly, note that $D \le 2mT$. Secondly, consider $T' = T - \epsilon$ for some
infinitesimal $\epsilon > 0$. At time $T'$, consider two situations:

1. There exists a node v, such that \/j,p v (Hj) < 1 — 0(— rs). Then there must exist two components 
Ci and C2 such that p v (Ci) > SO an d Pv{C2) > 0(— SO- Moreover, since Cy and C 2 are two 
distinct components, d(Ci,C 2 ) >?2T'. Then, by LemmaT, we have E[MPM] > O(-S) • 2T > 

v nm° ' 

2. Suppose the Ql is true but Q2 is still false. Suppose Hj is a component which homes odd number of 
nodes. We note that with probability at least (1 — ^?) n ~ 1, every node realizes to a point in its 
home. When this is the case, there is at least one node in Hj that needs to be matched with some node 
outside Hj, which incurs a cost of at least 2T. □ 
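The ball-growing procedure in this proof can be sketched with a union-find structure: two balls of radius $t$ intersect once $t$ reaches half the distance between their centers, so it suffices to process pairwise distances in increasing order. The Python below is a hedged illustration only. The function name, the dictionary representation of distributions, and in particular the parameter `delta`, which stands in for the slack term in Q1, are assumptions; `delta` is assumed to be smaller than 1/2 so that homes are unique.

```python
import math
from itertools import combinations

def find_homes(points, dists, delta):
    """Merge components by increasing distance until Q1 and Q2 both hold.

    points: list of point tuples; dists: one dict (point -> probability) per node;
    delta: assumed slack for Q1 (every node has mass >= 1 - delta in its home).
    Returns the home of each node as a set of points.
    """
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    def homes_if_valid():
        homes, counts = [], {}
        for dist in dists:
            mass = {}
            for p, pr in dist.items():
                root = find(p)
                mass[root] = mass.get(root, 0.0) + pr
            root, best = max(mass.items(), key=lambda kv: kv[1])
            if best < 1.0 - delta - 1e-12:
                return None  # Q1 fails for this node
            homes.append(root)
            counts[root] = counts.get(root, 0) + 1
        if any(c % 2 for c in counts.values()):
            return None  # Q2 fails: some component is home to an odd number of nodes
        return homes

    edges = sorted(combinations(points, 2), key=lambda e: math.dist(*e))
    homes, i = homes_if_valid(), 0
    while homes is None and i < len(edges):
        length = math.dist(*edges[i])  # balls of radius length / 2 now touch
        while i < len(edges) and math.dist(*edges[i]) <= length:
            a, b = find(edges[i][0]), find(edges[i][1])
            if a != b:
                parent[a] = b
            i += 1
        homes = homes_if_valid()
    if homes is None:
        raise ValueError("no valid homes (is the number of nodes odd?)")
    clusters = {}
    for p in points:
        clusters.setdefault(find(p), set()).add(p)
    return [clusters[root] for root in homes]
```

Checking Q1 and Q2 only after each batch of equal-length merges mirrors the "first time $T$" in the proof; there are at most $O(m^2)$ such checks.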



3.3 Analysis of the Performance Guarantee 

We show that for $i > 1$, the contribution from the event $F\langle i\rangle$ is negligible. We need the following structural result
about minimum perfect matchings, which is essential for our charging argument.

Suppose $S$ is the set of nodes that are out of their homes. We use $\mathcal{F}_S$ and $\mathcal{H}_{\bar{S}}$ to denote the set of
all realizations of the nodes in $S$ to points in $F$, and the set of all realizations of the nodes in $\bar{S} = V \setminus S$ to points
in $H$, respectively. We use $\mathrm{MPM}(F_S, H_{\bar{S}})$ to denote the length of the minimum perfect matching under
the realization $(F_S, H_{\bar{S}})$. The following combinatorial fact plays the same role in the charging argument
as Lemma 4 does in the previous section. Unlike in the MST problem, we cannot achieve a bound similar
to the one in Lemma 4, since $\mathrm{MPM}(F_S, H_{\bar{S}})$ may decrease significantly if we send only one
out-of-home node back to its home. However, we show that in such a case, if we send one more node back
home, $\mathrm{MPM}(F_S, H_{\bar{S}})$ can still be bounded.

Lemma 8 Fix a realization $(F_S, H_{\bar{S}})$. We use $\ell(v)$ to denote $d(v, H(v))$ for all nodes $v \in S$. Suppose
$v_1 \in S$ has the smallest $\ell$ value and $v_2$ has the second smallest $\ell$ value. Let $S' = S \setminus \{v_1\}$ and $S'' = S' \setminus \{v_2\}$.
Further, let $(F_{S'}, H_{\bar{S}'})$ be a realization obtained from $(F_S, H_{\bar{S}})$ by sending $v_1$ to a point in its home $H(v_1)$,
and let $(F_{S''}, H_{\bar{S}''})$ be a realization obtained from $(F_{S'}, H_{\bar{S}'})$ by sending $v_2$ to a point in its home $H(v_2)$. Then
we have that

$$\mathrm{MPM}(F_S, H_{\bar{S}}) \le 2(m+2)\,\mathrm{MPM}(F_{S'}, H_{\bar{S}'}) + 2(m+2)\,\mathrm{MPM}(F_{S''}, H_{\bar{S}''}).$$

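Proof: Let $d = \min_v \ell(v)$ and $D = \max_j \operatorname{diam}(H_j)$. Note that $d \ge D/m$, as $d \ge 2T$ and $D \le 2mT$. We
distinguish the following three cases:

1. $\mathrm{MPM}(F_S, H_{\bar{S}}) < d/2$. Using an argument similar to the one in Lemma 6, we have

$$\mathrm{MPM}(F_{S'}, H_{\bar{S}'}) + \mathrm{MPM}(F_S, H_{\bar{S}}) \ge \ell(v_1) = d.$$

So, we have $\mathrm{MPM}(F_S, H_{\bar{S}}) \le \mathrm{MPM}(F_{S'}, H_{\bar{S}'})$ in this case.

2. $\mathrm{MPM}(F_S, H_{\bar{S}}) \ge (m+2)d$. By the triangle inequality, we can see that

$$\mathrm{MPM}(F_{S'}, H_{\bar{S}'}) + (m+1)d \ge \mathrm{MPM}(F_{S'}, H_{\bar{S}'}) + d + D \ge \mathrm{MPM}(F_S, H_{\bar{S}}).$$

So, we have $\mathrm{MPM}(F_S, H_{\bar{S}}) \le (m+2)\,\mathrm{MPM}(F_{S'}, H_{\bar{S}'})$.

3. $d/2 \le \mathrm{MPM}(F_S, H_{\bar{S}}) \le (m+2)d$.

(a) $\mathrm{MPM}(F_{S'}, H_{\bar{S}'}) \ge d/2$. We directly have $\mathrm{MPM}(F_S, H_{\bar{S}}) \le 2(m+2)\,\mathrm{MPM}(F_{S'}, H_{\bar{S}'})$.

(b) $\mathrm{MPM}(F_{S'}, H_{\bar{S}'}) < d/2$. By Lemma 6, we have

$$\mathrm{MPM}(F_{S'}, H_{\bar{S}'}) + \mathrm{MPM}(F_{S''}, H_{\bar{S}''}) \ge d.$$

Then we have $\mathrm{MPM}(F_S, H_{\bar{S}}) \le 2(m+2)\,\mathrm{MPM}(F_{S''}, H_{\bar{S}''})$.

So we prove the lemma. □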

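What remains is to establish the following key lemma. The proof is similar to, but more involved than,
that of Lemma 5.

Lemma 9 For any $\epsilon > 0$, if $H$ satisfies the properties in Lemma 7, we have that

$$\sum_{i>1} \mathbb{E}[\mathrm{MPM} \mid F\langle i\rangle] \cdot \Pr[F\langle i\rangle] \le \epsilon \cdot \mathbb{E}[\mathrm{MPM} \mid F\langle 0\rangle] \cdot \Pr[F\langle 0\rangle] + \epsilon \cdot \mathbb{E}[\mathrm{MPM} \mid F\langle 1\rangle] \cdot \Pr[F\langle 1\rangle].$$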

Proof: We claim that for any $i \ge 1$,

E[MPM I F(i + 1)] ■ Pr[F(i + 1)] < ^(E[MPM | F(i)] • Pr[F(i)] + E[MPM | F(i - 1}] • Pr[F(i - 1)]) 

If the claim is true, the lemma can be proven easily as follows. For ease of notation, we use $A(i)$ to denote
$\mathbb{E}[\mathrm{MPM} \mid F\langle i\rangle] \cdot \Pr[F\langle i\rangle]$. First, we can see that

A(i + 2) + A(i + 1) < e -A(i + 1) + ^A(i) + e -A{i - 1) < + A(i - 1)). 
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So if i is odd, A(i + 2) + A(i + 1) < (§)( i+1 )/ 2 (,4(l) + A(0)). Therefore, £ iM < ^^(^(l) + 
^4(0)) < e(^4(l) + ^4(0)). Now, we prove the claim. Again, we rewrite the LHS as 

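$$\mathbb{E}[\mathrm{MPM} \mid F\langle i+1\rangle] \cdot \Pr[F\langle i+1\rangle] = \sum_{|S| = i+1}\ \sum_{F_S}\ \sum_{H_{\bar{S}}} \Pr[F_S, H_{\bar{S}}] \cdot \mathrm{MPM}(F_S, H_{\bar{S}}).$$

Similarly, we have the RHS to be

$$\mathbb{E}[\mathrm{MPM} \mid F\langle i\rangle] \cdot \Pr[F\langle i\rangle] = \sum_{|S'| = i}\ \sum_{F_{S'}}\ \sum_{H_{\bar{S}'}} \Pr[F_{S'}, H_{\bar{S}'}] \cdot \mathrm{MPM}(F_{S'}, H_{\bar{S}'}),$$

$$\mathbb{E}[\mathrm{MPM} \mid F\langle i-1\rangle] \cdot \Pr[F\langle i-1\rangle] = \sum_{|S''| = i-1}\ \sum_{F_{S''}}\ \sum_{H_{\bar{S}''}} \Pr[F_{S''}, H_{\bar{S}''}] \cdot \mathrm{MPM}(F_{S''}, H_{\bar{S}''}).$$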

Let $C(F_S, H_{\bar{S}}) = \Pr[F_S, H_{\bar{S}}] \cdot \mathrm{MPM}(F_S, H_{\bar{S}})$. Think of all $(F_{S'}, H_{\bar{S}'})$ with $|S'| = i$ and all $(F_{S''}, H_{\bar{S}''})$
with $|S''| = i - 1$ as buyers. The buyers want to buy all terms in the LHS. The budget of buyer $(F_{S'}, H_{\bar{S}'})$
is $C(F_{S'}, H_{\bar{S}'})$. We show there is a charging scheme such that every term $C(F_S, H_{\bar{S}})$ is fully paid by the
buyers and each buyer spends at most the fraction of her budget allowed by the claim.

Suppose we are selling the term $C(F_S, H_{\bar{S}})$. Consider the following charging scheme. Suppose $v_1 \in S$ is
the node that is realized to the point of $F_S$ closest to $H_{\bar{S}}$, and $v_2 \in S$ is the node that is
realized to the point of $F_S$ second closest to $H_{\bar{S}}$. Let $S' = S \setminus \{v_1\}$ and $S'' = S' \setminus \{v_2\}$.
If $(F_{S'}, H_{\bar{S}'})$ is obtained from $(F_S, H_{\bar{S}})$ by sending $v_1$ to a point in its home $H(v_1)$, we say $(F_{S'}, H_{\bar{S}'})$ is
consistent with $(F_S, H_{\bar{S}})$, denoted as $(F_{S'}, H_{\bar{S}'}) \sim (F_S, H_{\bar{S}})$. If $(F_{S''}, H_{\bar{S}''})$ is obtained from $(F_{S'}, H_{\bar{S}'})$
by sending $v_2$ to a point in its home $H(v_2)$, we say $(F_{S''}, H_{\bar{S}''})$ is consistent with $(F_{S'}, H_{\bar{S}'})$, denoted as
$(F_{S''}, H_{\bar{S}''}) \sim (F_{S'}, H_{\bar{S}'})$. Let

$$Z(F_S, H_{\bar{S}}) = \sum_{(F_{S'}, H_{\bar{S}'}) \sim (F_S, H_{\bar{S}})} \Pr[F_{S'}, H_{\bar{S}'}], \quad \text{and} \quad Z(F_{S'}, H_{\bar{S}'}) = \sum_{(F_{S''}, H_{\bar{S}''}) \sim (F_{S'}, H_{\bar{S}'})} \Pr[F_{S''}, H_{\bar{S}''}].$$

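For each buyer $(F_{S'}, H_{\bar{S}'}) \sim (F_S, H_{\bar{S}})$, we charge $(F_{S'}, H_{\bar{S}'})$ the following amount of dollars:

$$\frac{\Pr[F_{S'}, H_{\bar{S}'}]}{Z(F_S, H_{\bar{S}})} \cdot \Pr[F_S, H_{\bar{S}}] \cdot 2(m+2)\,\mathrm{MPM}(F_{S'}, H_{\bar{S}'}),$$

and we charge every buyer $(F_{S''}, H_{\bar{S}''})$ consistent with $(F_{S'}, H_{\bar{S}'})$ the following amount of money:

$$\frac{\Pr[F_{S'}, H_{\bar{S}'}]}{Z(F_S, H_{\bar{S}})} \cdot \frac{\Pr[F_{S''}, H_{\bar{S}''}]}{Z(F_{S'}, H_{\bar{S}'})} \cdot \Pr[F_S, H_{\bar{S}}] \cdot 2(m+2)\,\mathrm{MPM}(F_{S''}, H_{\bar{S}''}).$$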

In this case, we say $(F_{S''}, H_{\bar{S}''})$ is a sub-buyer of the term $C(F_S, H_{\bar{S}})$. By Lemma 8, we can see that
$C(F_S, H_{\bar{S}})$ is fully paid. To prove the claim, it suffices to show that each buyer $(F_{S'}, H_{\bar{S}'})$ and each
sub-buyer $(F_{S''}, H_{\bar{S}''})$ has been charged at most the allowed fraction of its budget. By the above charging scheme, the terms
in the LHS that are charged to buyer $(F_{S'}, H_{\bar{S}'})$ are consistent with $(F_{S'}, H_{\bar{S}'})$. Using the same argument as in
the previous section, we can show that the spending of $(F_{S'}, H_{\bar{S}'})$ as a buyer is at most the allowed fraction of

$$\mathrm{MPM}(F_{S'}, H_{\bar{S}'}) \cdot \Pr[F_{S'}, H_{\bar{S}'}].$$






The spending of (Fs»,Hg„) as a sub-buyer can be bounded as follows: 

£ TO? £ |«j-^»,^]-2(». + 2)MPM( Fs „,^) 

L F S,H S )~(F S ,,H S ,) (F s , ,Hg,)~(F s „ ,Hgir) 

= 2 (m + 2)MPM(^,^„)Pr[^,^]- £ |P#J £ 

(F S ,H § )~(F S „H §I ) { S > Sl (F s ,,H § ,)~(F s n,H §ll ) { S ' ' 5 ' j 

<2(m + 2)MPM(F 5 „,^„)Pr[F 5 »,^]. £ £ ^gfj 

(F s ,Hg)~(F s ,,H s ,)(F sl ,H §/ )~(F sll ,H sll ) { 5 '' ^'J 

< 2(m + 2)MPM(F 5 „, fl^)Pr[J^, ff s „] • m 2 £ ^^f^j 
<2(m + 2)MPM(F 5 „,^„)Pr[ J F 5 ,,^].m 2 £ | 

< ^MPM(F s »,i/ 5 „)Pr[F s »,^„] 

D 

Note that for each $(F_{S'}, H_{\bar{S}'})$, there are at most $m^2$ different $(F_S, H_{\bar{S}})$ such that $(F_S, H_{\bar{S}}) \sim (F_{S'}, H_{\bar{S}'})$. So
we have the second inequality. The third inequality can be seen by canceling out the same multiplicative terms
from the numerators and the denominators, as in Lemma 5. □

Therefore, we have obtained the main theorem of this section.

Theorem 2 Assuming there is an even number of vertices in the stochastic graph, there exists an FPRAS for
estimating the expected length of the minimum perfect matching.



4 Minimum Cycle Covers 

In this section, we consider the minimum length cycle cover problem. In the deterministic version of the
cycle cover problem, we are asked to find a collection of vertex-disjoint cycles such that every vertex is in
exactly one cycle and the total length is minimized. Here we assume that every cycle contains at least two nodes.
If a cycle contains exactly two nodes, the length of the cycle is two times the distance between these two
nodes. The problem can be solved in polynomial time by reducing it to a minimum bipartite
perfect matching problem.$^1$ W.l.o.g., we assume no two edges in $V \times V$ have the same length. For ease of
exposition, we assume that for each point, there is only one node that may realize at this point. In principle,
if more than one node may realize at the same point, we can create multiple copies of the point co-located
at the same place, and impose a distinct infinitesimal distance between every pair of copies, to ensure that no
two edges have the same length.

We need the notion of the nearest neighbor graph, denoted by NN. For an undirected graph, an edge
$e = (u, v)$ is in the nearest neighbor graph if $u$ is the nearest neighbor of $v$, or vice versa. We also use NN
to denote its length. $\mathbb{E}[\mathrm{NN}]$ can be computed exactly in polynomial time [23]. As a warmup, we first show
that $\mathbb{E}[\mathrm{NN}]$ is a 2-approximation of $\mathbb{E}[\mathrm{CC}]$ in the following lemma.

$^1$ If we require each cycle to consist of at least three nodes, the problem is still poly-time solvable by a reduction to minimum perfect
matching by Tutte [31]. Hartvigsen [21] obtained a polynomial time algorithm for minimum cycle cover with each cycle having
at least 4 nodes. Cornuejols and Pulleyblank [13] have reported that Papadimitriou showed the NP-completeness of minimum cycle
cover with each cycle having at least 6 nodes.

Lemma 10 $\mathbb{E}[\mathrm{NN}] \le \mathbb{E}[\mathrm{CC}] \le 2\,\mathbb{E}[\mathrm{NN}]$.

Proof: We show $\mathrm{NN} \le \mathrm{CC} \le 2\,\mathrm{NN}$ for each possible realization. We prove the first inequality. In the cycle cover, for each node
$u$, there are two edges incident on $u$. Suppose they are $e_{u1}$ and $e_{u2}$. We have $\mathrm{CC} = \frac{1}{2}\sum_u \big(d(e_{u1}) + d(e_{u2})\big) \ge \mathrm{NN}$.
The second inequality can be seen by doubling all edges in NN and applying the triangle inequality. □
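The per-realization bound $\mathrm{NN} \le \mathrm{CC} \le 2\,\mathrm{NN}$ can be sanity-checked on small instances. The sketch below is illustrative and not from the paper: it assumes Euclidean points in the plane, and computes the minimum cycle cover by enumerating fixed-point-free permutations (a cycle on two nodes is then counted with twice the distance, as in the definition above), which is feasible only for a few nodes.

```python
import math
import random
from itertools import permutations

def cycle_cover_length(points):
    """Minimum cycle cover length by enumerating permutations without fixed points."""
    n = len(points)
    best = math.inf
    for perm in permutations(range(n)):
        if any(perm[i] == i for i in range(n)):
            continue  # a fixed point would be a cycle on a single node
        cost = sum(math.dist(points[i], points[perm[i]]) for i in range(n))
        best = min(best, cost)
    return best

def nn_graph_length(points):
    """Total length of the nearest neighbor graph (each undirected edge counted once)."""
    edges = set()
    for i, p in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: math.dist(p, points[k]))
        edges.add(frozenset((i, j)))
    return sum(math.dist(points[a], points[b]) for a, b in map(tuple, edges))

def check_lemma10(trials=30, n=6, seed=2):
    """Check NN <= CC <= 2 NN on random instances in the unit square."""
    rng = random.Random(seed)
    for _ in range(trials):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        nn, cc = nn_graph_length(pts), cycle_cover_length(pts)
        if not (nn <= cc + 1e-12 and cc <= 2 * nn + 1e-12):
            return False
    return True
```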

We denote the longest edge in NN (and also its length) by $L$. Note that $L$ is also a random variable. By
the law of total expectation, we estimate $\mathbb{E}[\mathrm{CC}]$ based on the following formula:

$$\mathbb{E}[\mathrm{CC}] = \sum_{e \in V \times V} \Pr[L = e] \cdot \mathbb{E}[\mathrm{CC} \mid L = e].$$

It is easy to see that $\mathrm{NN}/n \le L \le \mathrm{NN}$. Combined with Lemma 10, we have that

$$d(e) \le \mathbb{E}[\mathrm{CC} \mid L = e] \le 2n\,d(e). \qquad (1)$$

However, it is not clear to us how to estimate $\Pr[L = e]$ or how to take samples conditioned on the event
$L = e$ efficiently. To circumvent this difficulty, we consider some simpler events and condition $L = e$ on
those simpler events. Consider a particular edge $e = (s, t) \in V \times V$. Denote by $N_s(t)$ the event that the
nearest neighbor of $s$ is $t$ and by $N_t(s)$ the event that the nearest neighbor of $t$ is $s$. Let $L_{st}$ be the event that the
longest edge $L$ in NN is $e(s, t)$. Let $A_s(t) = N_s(t) \wedge L_{st}$. First we write

$$\begin{aligned}
\mathbb{E}[\mathrm{CC} \mid L = e] \cdot \Pr[L = e] &= \mathbb{E}[\mathrm{CC} \mid A_s(t) \vee A_t(s)] \cdot \Pr[A_s(t) \vee A_t(s)] \\
&= \mathbb{E}[\mathrm{CC} \mid A_s(t)] \cdot \Pr[A_s(t)] + \mathbb{E}[\mathrm{CC} \mid A_t(s)] \cdot \Pr[A_t(s)] \\
&\quad - \mathbb{E}[\mathrm{CC} \mid A_s(t) \wedge A_t(s)] \cdot \Pr[A_s(t) \wedge A_t(s)].
\end{aligned}$$

Now, we show how to estimate $\mathbb{E}[\mathrm{CC} \mid A_s(t)] \cdot \Pr[A_s(t)]$ for each edge $e(s, t)$. The other two terms can
be estimated in the same way. Also notice that the third term is no larger than either the first or the second term.
Therefore, for any points $s$ and $t$, we have the following fact, which will be useful later:

$$\mathbb{E}[\mathrm{CC}] \ge \mathbb{E}[\mathrm{CC} \mid L = e] \cdot \Pr[L = e] \ge \mathbb{E}[\mathrm{CC} \mid A_s(t)] \cdot \Pr[A_s(t)]. \qquad (2)$$

Moreover, we have that

$$\mathbb{E}[\mathrm{CC} \mid A_s(t)] \cdot \Pr[A_s(t)] = \mathbb{E}[\mathrm{CC} \mid A_s(t)] \cdot \Pr[L_{st} \wedge N_s(t)] = \mathbb{E}[\mathrm{CC} \mid A_s(t)] \cdot \Pr[L_{st} \mid N_s(t)] \cdot \Pr[N_s(t)].$$

Suppose $v$ is the only node that may realize to $s$ and $u$ is the only node that may realize to $t$. We use $B$ as a
shorthand notation for $B(s, d(s, t))$. We first observe that $\Pr[N_s(t)]$ can be computed exactly in polynomial time
as follows:

$$\Pr[N_s(t)] = p_{vs} \cdot p_{ut} \cdot \prod_{w \ne u, v} \big(1 - p_w(B)\big).$$

Also note that we can take samples conditioned on the event $N_s(t)$ (the corresponding probability distribution
for a node $w \ne u, v$ is $\Pr[w \models r \mid N_s(t)] = \frac{p_{wr}}{1 - p_w(B)}$ for each point $r \notin B$).
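The closed form for $\Pr[N_s(t)]$ and the conditional distribution used for sampling can be illustrated as follows. This is a minimal sketch under stated assumptions: distributions are dictionaries from points to probabilities, distances are Euclidean, each point can be reached by only one node, and the function names are hypothetical. The brute-force routine is included only to cross-check the product formula on tiny instances.

```python
import math
from itertools import product

def prob_nearest_neighbor(dists, v, u, s, t):
    """Pr[N_s(t)]: node v is at s, node u is at t, and no other node lies in B(s, d(s, t))."""
    radius = math.dist(s, t)
    prob = dists[v].get(s, 0.0) * dists[u].get(t, 0.0)
    for w, dist in enumerate(dists):
        if w in (v, u):
            continue
        inside = sum(pr for p, pr in dist.items() if math.dist(s, p) <= radius)
        prob *= 1.0 - inside
    return prob

def conditional_outside_ball(dist, s, radius):
    """Distribution of another node conditioned on N_s(t): restrict to points outside B."""
    outside = {p: pr for p, pr in dist.items() if math.dist(s, p) > radius}
    total = sum(outside.values())
    return {p: pr / total for p, pr in outside.items()}

def brute_force(dists, v, s, t):
    """Enumerate all realizations and add up those where the nearest neighbor of s is t."""
    total = 0.0
    for combo in product(*(d.items() for d in dists)):
        locs = [p for p, _ in combo]
        if locs[v] != s:
            continue
        nearest = min((p for w, p in enumerate(locs) if w != v),
                      key=lambda p: math.dist(s, p))
        if nearest == t:
            total += math.prod(pr for _, pr in combo)
    return total
```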

Estimating $\mathbb{E}[\mathrm{CC} \mid A_s(t)] \cdot \Pr[L_{st} \mid N_s(t)]$: Next, we show how to estimate $\mathbb{E}[\mathrm{CC} \mid A_s(t)] \cdot \Pr[L_{st} \mid N_s(t)]$.
The high-level idea is the following. We take samples conditioned on $N_s(t)$. If $\Pr[L_{st} \mid N_s(t)]$ is
large (i.e., at least $1/\mathrm{poly}(nm)$), we can get enough samples satisfying $L_{st}$, and thus $A_s(t)$. Therefore, we can
get a $(1 \pm \epsilon)$-approximation for both $\Pr[L_{st} \mid N_s(t)]$ and $\mathbb{E}[\mathrm{CC} \mid A_s(t)]$ in polynomial time (we also use the fact that
if $A_s(t)$ is true, CC is at least $d(s, t)$ and at most $n\,d(s, t)$). However, if $\Pr[L_{st} \mid N_s(t)]$ is small, it is not
clear how to obtain a reasonable estimate of this value. In this case, we show that the contribution of the term to
our final answer is extremely small, and even an inaccurate estimate of the term will not affect our answer
in any significant way, with high probability.

2 3 

Now, we elaborate the details. We iterate the following steps for N times (N = 0{ n ™ {Inn + Inm)) 
suffices). 

• Suppose we are in the $i$th iteration. We take a sample $G_i$ of the stochastic graph conditioned on the
event $N_s(t)$. We compute the nearest neighbor graph $\mathrm{NN}(G_i)$ and the minimum length cycle cover
$\mathrm{CC}(G_i)$ of $G_i$. If $e(s, t)$ is the longest edge in $\mathrm{NN}(G_i)$, let $I_i = 1$. Otherwise, let $I_i = 0$.

Our estimate of $\mathbb{E}[\mathrm{CC} \mid A_s(t)] \cdot \Pr[L_{st} \mid N_s(t)]$ is the following:

$$\frac{1}{N} \sum_{i=1}^{N} I_i \cdot \mathrm{CC}(G_i).$$

It is easy to see that the expectation of $\frac{1}{N}\sum_{i=1}^{N} I_i \cdot \mathrm{CC}(G_i)$ is exactly $\mathbb{E}[\mathrm{CC} \mid A_s(t)] \cdot \Pr[L_{st} \mid N_s(t)]$.
We distinguish the following two cases:
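The estimator can be written as a short routine. The sketch below is an assumption-laden illustration: `samples` is taken to be a list of realizations already drawn conditioned on $N_s(t)$, and `is_longest_edge` and `cycle_cover_length` are hypothetical callables supplied by the caller (the first plays the role of the indicator $I_i$, the second computes $\mathrm{CC}(G_i)$).

```python
def estimate_conditional_term(samples, is_longest_edge, cycle_cover_length):
    """Return (1/N) * sum_i I_i * CC(G_i), plus the two factors it decomposes into."""
    n = len(samples)
    indicators = [1 if is_longest_edge(g) else 0 for g in samples]
    # The cycle cover is only computed for successful samples (I_i = 1).
    weighted = sum(cycle_cover_length(g) if i else 0.0
                   for i, g in zip(indicators, samples))
    hits = sum(indicators)
    return {
        "estimate": weighted / n,                            # estimates E[CC | A_s(t)] * Pr[L_st | N_s(t)]
        "hit_rate": hits / n,                                # estimates Pr[L_st | N_s(t)]
        "conditional_mean": weighted / hits if hits else None,  # estimates E[CC | A_s(t)]
    }
```

When there are successful samples, the estimate equals the product of the hit rate and the conditional mean; these are the two quantities whose accuracy is discussed in the case analysis.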
We distinguish the following two cases: 

1. Pr[L st | N,(t)] > By Lemma 1, E (l±e)Pr[L st | N 8 (t)] with high probability. More- 
over, we can get YltLi h — &y^z(lnn + Inm) J with high probability. In this case, we have enough 

successful samples (samples with = 1) to guarantee that — is a (1 ± e)-approximation 

of E[ CC | A s (t) ] with high probability, again by Lemma 1. We note that under condition A s (t), we 
have a nice instance since CC is at least d(s, t) and at most nd(s, t). 

2. Pv[L st | N s (t)] < » £ tj . We note that U = means that while N s {t) happens, the longest edge L 
in NN is longer than e(s,t). Suppose e(s',t r ) is the edge with the maximum Pr[L s ' t '\N s {t)]. Since 
Pr[L st | N s (t)} < e(s',t') must be different from e(s,t) andPr[L s , t , | N s (t)] > ^Pv[L st \ 
N s (t)]. Hence, we have that 

E[CC | A s (t)\ ■ Pr[A s (t)\ = E[CC | A s (t)\ ■ Pr[L st | N s (t)] ■ Pr[N s (t)\ 

< n • d(s,t) • ^2 • Pr I L ^' I N.{t)] • Pr[JV,(t)] 

< ^ • d ( s '^') • Pl i L s't' I N s (t)] • Pv[N s (t)} 
<^ -E[CC | A s , (*')]■ Pt[Lm] 

< -At -EfCCl 

The first and third inequalities are due to (1) and the fourth are due to (2). By Chernoff Bound, we 
have that 

^7V 



Pr 



e " 
< — 



N ~ m? 

Then, with probability at least 1 — poly(^), the contribution from all such edges is less than eE[CC]. 

In summary, we obtain the following theorem. 

Theorem 3 There is an FPRAS for estimating the expected length of the minimum cycle cover in a stochastic
graph.

Finally, we remark that our algorithm also works in the presence of both locational uncertainty and node
uncertainty, i.e., when the existence of each node is a Bernoulli random variable. It is not hard to extend our
technique to handle the case where each cycle is required to contain at least three nodes. This is done
by considering the longest edge in the 2NN graph (each vertex connects to its nearest and second nearest
neighbors). The extension is fairly straightforward and we omit the details here.

5 Conclusion 

We obtain FPRASs for the problems of computing the expected lengths of the minimum spanning tree, the
minimum perfect matching and the minimum cycle cover on a stochastic graph where the location of each
node is a random point in a given metric space. Our results for the stochastic minimum perfect matching
and the stochastic minimum cycle cover are the first known algorithms.

An interesting open problem is that of estimating the expected value of the minimum cost
matching of a certain cardinality (instead of the perfect matching). It is not clear how to extend our technique
to handle this problem. There are some other important combinatorial optimization problems that have not
been studied in this model, such as the $b$-matching and the Euclidean $k$-median problem (the deterministic
version admits a PTAS in Euclidean spaces [5, 25]).

It is also interesting to study problems for which the deterministic version is APX-hard. In such cases,
it is not possible to obtain an FPRAS, and the best ratio we can hope for is the best approximation ratio we can
obtain for the deterministic version of the problem.

A Another FPRAS for MST

W.l.o.g., we assume that for each point, there is only one node that may realize to this point. Our algorithm
is a slight generalization of the one proposed in [23]. Let $E[i]$ be the expected MST length conditioned on
the event that all nodes $\{s_1, \ldots, s_n\}$ are realized to points in $\{u_i, \ldots, u_m\}$ (denote the event by $\mathrm{In}(i, m)$).
Let $E'[i]$ be the expected MST length conditioned on the event that all nodes $\{s_1, \ldots, s_n\}$ are realized to
$\{u_i, \ldots, u_m\}$ and at least one node is realized to $u_i$. We use $s \models u$ to denote the event that node $s$ is realized
to point $u$. It is easy to see that

$$E[i] = E'[i] \cdot \Pr[\exists s,\ s \models u_i \mid \mathrm{In}(i, m)] + E[i+1] \cdot \Pr[\nexists s,\ s \models u_i \mid \mathrm{In}(i, m)].$$

For a particular point $u_i$, we reorder the points $\{u_i, \ldots, u_m\}$ as $\{u_i = r_i, \ldots, r_m\}$ in increasing order
of distance from $u_i$. Let $E'[i, j]$ be the expected MST length for all nodes conditioned on the event that all
nodes are realized to $\{r_i, \ldots, r_j\}$ (denoted as $\mathrm{In}'(i, j)$) and $\exists s,\ s \models u_i$. Let $E''[i, j]$ be the expected MST
length for all nodes conditioned on the event $\mathrm{In}'(i, j) \wedge (\exists s,\ s \models u_i) \wedge (\exists s',\ s' \models r_j)$. We can see that

$$E'[i, j] = E''[i, j] \cdot \Pr[\exists s',\ s' \models r_j \mid \mathrm{In}'(i, j),\ \exists s,\ s \models u_i] + E'[i, j-1] \cdot \Pr[\nexists s',\ s' \models r_j \mid \mathrm{In}'(i, j),\ \exists s,\ s \models u_i].$$

It is not difficult to see that the probability $\Pr[\exists s',\ s' \models r_j \mid \mathrm{In}'(i, j),\ \exists s,\ s \models u_i]$ can be computed in polynomial
time. Here we use the assumption that for each point, only one node may realize to it. Moreover, we
can also take samples conditioned on the event $\mathrm{In}'(i, j) \wedge (\exists s,\ s \models u_i) \wedge (\exists s',\ s' \models r_j)$. Therefore, $E''[i, j]$ can
be approximated within a factor of $(1 \pm \epsilon)$ using the Monte Carlo method in polynomial time, since it is a
nice instance. The number of samples needed can be bounded by a polynomial.
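To illustrate why these conditional probabilities are easy to compute under the one-node-per-point assumption, consider the outer recursion for $E[i]$: only the single node that may realize to $u_i$ matters, and by independence its distribution is simply renormalized to the allowed point set. The helper names and the dictionary representation below are assumptions made for this sketch; it shows one step of the recursion and is not the full algorithm.

```python
def prob_point_occupied(owner_dist, allowed, target):
    """Pr[the owner of `target` is realized there | it is realized inside `allowed`].

    owner_dist: dict (point -> probability) of the only node that may realize to `target`.
    """
    mass = sum(pr for p, pr in owner_dist.items() if p in allowed)
    return owner_dist.get(target, 0.0) / mass

def combine(e_occupied, e_rest, occupied_prob):
    """One step of the recursion: E[i] = E'[i] * Pr[occupied] + E[i+1] * Pr[not occupied]."""
    return e_occupied * occupied_prob + e_rest * (1.0 - occupied_prob)
```

For example, if the owner of a point has probabilities 0.3 and 0.5 on the two allowed points, the renormalized probability of the first is 0.3 / 0.8 = 0.375.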

We can easily generalize the above algorithm to the case where $\sum_{j=1}^{m} p_{ij} \le 1$, i.e., node $i$ may not be
present with some probability. Indeed, this can be done by generalizing the definition of $\mathrm{In}(i, j)$ (and similarly
$\mathrm{In}'(i, j)$) to be the event that each node is either not present or realized to some point in $\{r_i, \ldots, r_j\}$.